Cloud Computing: Scalability, True or False?
Despite the draft definition published by NIST (the National Institute of Standards and Technology), we continue to talk about Cloud Computing as if it were a technology capable of doing everything.
One of its most frequently cited features is its allegedly innate scalability, so let us start with a proper understanding of what scalability means:
We can define scalability as the ability of a system to grow with the number of requests made to it from outside. In practice this usually means horizontal growth on "commodity hardware": adding cheap servers to increase the power of the system and to balance its load naturally. This seems to fit Cloud Computing well, which by definition is a simply scalable infrastructure, built on commodity hardware and capable of delivering power on demand to the end user. Yet many solution architects will remember all too well the difficulties they faced in scaling customer applications, and most of the largest portals and social networks have had to develop their own scaling logic, shaped by the nature of their applications.
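The horizontal-growth idea can be sketched in a few lines: a balancer distributes incoming requests across a pool of commodity servers, and adding a server to the pool immediately raises total capacity. This is an illustrative sketch, not a production balancer; the class and server names are hypothetical.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Distributes requests across a pool of commodity servers.

    Horizontal scaling: appending a server to the pool immediately
    increases the total capacity of the system.
    """
    def __init__(self, servers):
        self.servers = list(servers)
        self._rotation = cycle(self.servers)

    def add_server(self, server):
        # Scaling out: grow the pool and rebuild the rotation.
        self.servers.append(server)
        self._rotation = cycle(self.servers)

    def route(self, request):
        # Each request goes to the next server in rotation.
        return next(self._rotation)

balancer = RoundRobinBalancer(["web-1", "web-2"])
assigned = [balancer.route(f"req-{i}") for i in range(4)]
# Requests alternate between the two servers in the pool.
```

Real balancers use smarter policies (least connections, health checks), but the principle is the same: capacity grows by adding nodes, not by buying a bigger machine.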
We first came across Amazon AWS in 2007, while reading highscalability.com, the reference site for scalability, where an article hypothesized using Amazon web services to build social networks; at that time the term Cloud Computing had not yet been coined.

As is now well known, Cloud Computing is based on the virtualization of servers, storage and network services. Until not long ago most experts dismissed virtualization, since an individual virtual machine cannot match the performance of the same configuration without a hypervisor layer, and even today you can find architects who object on those grounds. The objection is not wrong, but what leads one to prefer the Cloud over the traditional approach is the enormous economic saving, together with the simplicity and speed of procuring resources, compared with deploying new hardware, which also requires qualified personnel for configuration.

If we then consider that Amazon, the largest e-commerce operation in the world, runs on an infrastructure equal to the AWS services it sells to customers, we have further evidence of what a correct Cloud implementation can deliver. Over time Amazon matured and developed in-house the elastic logic it now sells to AWS customers, as can be seen in this video where Jeff Bezos, founder of Amazon.com, describes Amazon Web Services as a product born in Amazon's laboratories to manage its own infrastructure:
As is well known, Cloud Computing can provide services in three ways:
- SaaS
- PaaS
- IaaS
In the first case, SaaS (Software as a Service), the application is already developed and therefore already designed to scale on the underlying infrastructure. In this case it is not certain that there is a Cloud underneath; this is the case, for example, with Gmail, Google Documents and Salesforce.
In the second case, PaaS (Platform as a Service), there is a platform capable of hosting applications written by others, provided those applications meet strict requirements. This is the case, for example, of Google App Engine, which lets you publish Java applications without worrying if their traffic grows dramatically. Again, it is not always true that the substrate is a Cloud.
The last is IaaS (Infrastructure as a Service): a set of software interfaces (APIs, web services), used through authentication, with which we can program the computational resources needed to run our application. This layer is therefore not naturally scalable; rather, we must program it to scale our application. The clearest and best-known example of IaaS is AWS (Amazon Web Services).
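To make the idea of "programming the infrastructure" concrete, here is a minimal sketch. The `ToyIaaS` class and its method names are hypothetical stand-ins for a real IaaS API (on AWS you would issue the equivalent authenticated web-service calls); the point is only that capacity is requested and released through code rather than through hardware procurement.

```python
import uuid

class ToyIaaS:
    """Hypothetical stand-in for an IaaS API; not a real AWS client."""
    def __init__(self):
        self.instances = {}

    def run_instances(self, image_id, count):
        # Provision `count` virtual machines from a machine image.
        ids = [f"i-{uuid.uuid4().hex[:8]}" for _ in range(count)]
        for iid in ids:
            self.instances[iid] = {"image": image_id, "state": "running"}
        return ids

    def terminate_instances(self, ids):
        # Release capacity when it is no longer needed.
        for iid in ids:
            self.instances[iid]["state"] = "terminated"

cloud = ToyIaaS()
web_tier = cloud.run_instances("ami-webapp", count=3)  # scale out in one call
cloud.terminate_instances(web_tier[1:])                # scale back in just as easily
running = [i for i, m in cloud.instances.items() if m["state"] == "running"]
```

The contrast with traditional procurement is the whole point: growing or shrinking the fleet is a function call, not a purchase order.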
Solutions
Many companies are emerging and trying to establish themselves in the Cloud scalability market. All of them are software built on top of the lowest layer, more or less capable of responding to overload events by balancing the load onto new resources requested automatically from the system. Most of this software sits between PaaS and IaaS, because it must leave the user a minimum of programmability and configuration, so the user cannot be entirely unfamiliar with the subject. One of the most representative cases is RightScale, which, however, besides being complex is also expensive.
RightScale provides a layer simpler than raw programming against the AWS APIs: through a catalogue of ready-made cases it guides you in building your virtual infrastructure in the Cloud, scalable according to the logic you define.
Today Amazon releases, in beta, its (long-awaited, and anticipated by us) scalability solution for AWS, launching the following web services:
- CloudWatch
- Elastic Load Balancing
- AutoScaling
These three services, which as is well known are used through scripting, allow you to monitor AMIs (Amazon Machine Images), to set up a load balancer for your application, and to define groups and triggers that automatically bring new resources online to support the application's performance demands.
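The group-and-trigger mechanism can be sketched as a simple threshold policy: if the average load over a monitoring window exceeds an upper bound, add an instance; if it falls below a lower bound, remove one. The thresholds and function below are illustrative assumptions, not the actual Auto Scaling API.

```python
def scaling_decision(cpu_samples, current_size, min_size=1, max_size=10,
                     high=0.70, low=0.25):
    """Toy trigger: return the desired group size given recent CPU load.

    `cpu_samples` is the monitoring window (fractions of CPU used),
    like the metrics CloudWatch would report for the group.
    """
    avg = sum(cpu_samples) / len(cpu_samples)
    if avg > high and current_size < max_size:
        return current_size + 1   # scale out under sustained load
    if avg < low and current_size > min_size:
        return current_size - 1   # scale in when the group is idle
    return current_size           # stay within the dead band

# Usage: a busy window grows the group, a quiet one shrinks it.
busy = scaling_decision([0.90, 0.80, 0.85], current_size=2)   # grows to 3
quiet = scaling_decision([0.10, 0.05, 0.20], current_size=3)  # shrinks to 2
```

The dead band between the two thresholds matters: without it, a group hovering near a single threshold would oscillate, launching and terminating instances on every sample.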
The offering seems excellent; only CloudWatch seems overpriced: monitoring 10 AMIs would cost more than a single AMI on which we could install a monitoring system able to watch many more.
Returning to the problem of scalability: there are, then, Cloud solutions that promise worry-free development, but only good architects will raise the right objections, starting with this one: how do we scale a database?
DB Scalability
And it is here that we can say that Cloud Computing does not scale naturally: a complex series of measures must be adopted and evaluated case by case. By now most experts know that scaling the relational database of a social network, i.e. one with a read/write balance of about 50%, is not easy. It involves a whole series of approaches: distributing requests by splitting the database into many pieces (sharding), changing the application, and inserting an in-memory database (e.g. memcached) to remember where each piece lives.
The more astute might suggest switching to the modern in-memory data grid solutions, a sort of Cloud of RAM resources over which we spread our database. Performance would certainly be rewarded, but RAM costs more than disk by roughly the same order of magnitude by which it outperforms it. Besides being uneconomical, it is also overabundant: not all of our application's data needs to be kept online all the time, because some of it may never be requested.
Another very commercial but not very functional trend is the private Cloud able to spill over into the public Cloud (Hybrid Cloud). First of all, we still lack a standard, so each solution would have to be developed ad hoc for each customer. And the previous problem returns: if our need is database scalability, how do we scale from private to public? The bandwidth typically available between a private data center and the data center hosting a public Cloud is on the order of tens of Mbit/s in download, and even less in upload, so we would need to activate replicas that could easily fall out of sync.
Real customer needs
Customers generally demand simplicity: they want to move to the Cloud without any impact on their normal development, without upsetting the application, without abandoning the relational logic of their database, because those changes would mean large costs, even replacing staff who are not qualified, or not suited, to rewrite the application to grow in the Cloud.
VMEngine is shipping its first release based on a simple approach to scalability, with auto-scaling of the entire application, database servers included; in the meantime it continues its research to make scalability ever more automated and ever more transparent, so that customers can go back to developing without having to overturn their developers' know-how.